Feature selection in high-dimensional data

from sklearn.feature_selection import f_regression
import numpy as np
from sklearn import svm
from sklearn import linear_model
import svmcrossvalidate
from array import array

# Main
#f = open("testdata1.txt")
f = open("testdata.txt")
mylist = f.readlines()
testdata = []
for i in range(0, len(mylist), 1):
    l = mylist[i].split()
    for j in range(0, len(l), 1):
        ...
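The excerpt imports scikit-learn's f_regression but is cut off before it is used. As a rough illustration of what univariate F-test feature selection does, here is a dependency-free sketch. It assumes the centered case that f_regression documents: each feature's Pearson correlation r with the target is converted to F = r^2 / (1 - r^2) * (n - 2), and the k highest-scoring columns are kept. The helper names (pearson_r, f_scores, select_k_best) and the toy data are hypothetical. In scikit-learn itself the same idea is usually written as SelectKBest(f_regression, k=...).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def f_scores(X, y):
    """Univariate F statistic per column: F = r^2 / (1 - r^2) * (n - 2)."""
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        r = pearson_r(col, y)
        if abs(r) >= 1.0:
            scores.append(math.inf)  # perfect linear fit
        else:
            scores.append(r * r / (1 - r * r) * (n - 2))
    return scores


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring columns and the reduced rows."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve original column order
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: column 0 tracks y closely, column 1 is mostly noise,
# column 2 is moderately correlated with y.
X = [
    [1.1, 5, 2],
    [1.9, 1, 1],
    [3.2, 4, 4],
    [3.9, 2, 3],
    [5.1, 6, 6],
    [6.0, 3, 5],
]
y = [1, 2, 3, 4, 5, 6]

scores = f_scores(X, y)
keep, X_reduced = select_k_best(X, y, k=2)
print(keep)
```

On this toy data the noisy middle column gets the lowest score, so the two retained columns are 0 and 2. In genuinely high-dimensional data, a univariate filter like this is cheap, but it scores each feature in isolation and cannot see interactions between features.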

Categorizing News Articles with Natural Language Processing

Introduction
In this post, I’ll summarize the exploratory data analyses I performed, explain the feature engineering and reduction steps I used, and present my final models for classifying news articles.
Proposition
The goal of this project was to classify news articles into business/technology, entertainment, opinion, politics, science/health, sports, or world...
Tags: NLP, pipeline, sentiment analysis, named entity recognition, Naive Bayes, web scraping, feature selection